Scope fastokens adaptation to individual tokenizers by xeophon · Pull Request #91 · PrimeIntellect-ai/renderers

xeophon · 2026-06-21T10:27:26Z

Overview

Load tokenizers through vanilla Transformers first, then replace only the returned tokenizer's backend with the same fastokens compatibility shim used by the global patch path.

This removes the process-wide patch window from renderers.load_tokenizer while preserving the fastokens-backed tokenizer returned to renderer callers.
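A minimal sketch of the two-step flow: a vanilla load first, then a backend swap on the returned instance only. It assumes the shim can be built by wrapping an existing backend object. `FakeTokenizer`, `VanillaBackend`, `ShimBackend`, and `load_tokenizer_sketch` are hypothetical stand-ins, not the Transformers or fastokens APIs, and the real `fastokens._compat._TokenizerShim` constructor may take different arguments.

```python
# Hypothetical stand-ins: these classes only mimic the shape of a fast
# tokenizer whose backend lives in `_tokenizer`; they are not real APIs.
class VanillaBackend:
    def encode(self, text: str) -> list[int]:
        return [ord(ch) for ch in text]  # toy encoding


class ShimBackend:
    """Stand-in for fastokens' shim; assumed here to wrap an existing backend."""

    def __init__(self, inner: VanillaBackend) -> None:
        self.inner = inner

    def encode(self, text: str) -> list[int]:
        return self.inner.encode(text)  # same ids as the vanilla backend


class FakeTokenizer:
    def __init__(self) -> None:
        self._tokenizer = VanillaBackend()  # per-instance backend


def load_tokenizer_sketch() -> FakeTokenizer:
    tok = FakeTokenizer()  # step 1: vanilla load, no global state touched
    tok._tokenizer = ShimBackend(tok._tokenizer)  # step 2: swap this instance only
    return tok
```

Because only the `_tokenizer` attribute of the returned instance changes, a `FakeTokenizer()` created anywhere else keeps its `VanillaBackend`.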

Why

The previous path called fastokens.patch_transformers() around the complete AutoTokenizer.from_pretrained operation. Tokenizer loading can take hundreds of milliseconds or longer, and the patch mutates Transformers classes and decoder state process-wide for that entire interval. Unrelated tokenizer loads running concurrently could therefore receive fastokens unexpectedly, including its lack of offset mappings.
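To illustrate the race, here is a toy model in which a class attribute stands in for the Transformers state that `patch_transformers()` mutates. `Loader`, `VanillaBackend`, `ShimBackend`, and the two helper functions are hypothetical; two `threading.Event` objects force the unrelated load to land inside the patch window deterministically instead of relying on timing.

```python
import threading


# Hypothetical toy model, not the Transformers or fastokens APIs.
class VanillaBackend:
    name = "vanilla"


class ShimBackend:
    name = "shim"


class Loader:
    # Class-level state: changing it affects every caller in the process.
    backend_cls = VanillaBackend

    @classmethod
    def from_pretrained(cls):
        return cls.backend_cls()


def patched_slow_load(in_window: threading.Event, release: threading.Event):
    original = Loader.backend_cls
    Loader.backend_cls = ShimBackend  # process-wide patch begins
    try:
        in_window.set()  # the patch window is now open
        release.wait(timeout=5)  # stands in for a slow from_pretrained
        return Loader.from_pretrained()
    finally:
        Loader.backend_cls = original  # patch window closes


def unrelated_load_during_window() -> str:
    in_window, release = threading.Event(), threading.Event()
    worker = threading.Thread(target=patched_slow_load, args=(in_window, release))
    worker.start()
    in_window.wait(timeout=5)
    # An unrelated caller that loads now sees the patched class state.
    seen = Loader.from_pretrained().name
    release.set()
    worker.join()
    return seen
```

Here `unrelated_load_during_window()` returns `"shim"`: the unrelated caller received the patched backend without asking for it, and the class is only restored once the slow load finishes.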

Per-instance adaptation keeps Transformers globals vanilla throughout the load. Concurrent renderer-pool slots still receive independent fastokens backends, while environment, harness, and application tokenizer calls remain unaffected.
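With per-instance adaptation, the same kind of concurrency yields independent backends and no shared mutation. This is again a hypothetical toy model: `Tokenizer`, `ShimBackend`, `load_adapted`, and `build_pool` only mirror the shape of a renderer pool whose slots load in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins, not the renderers or fastokens APIs.
class VanillaBackend:
    pass


class ShimBackend:
    def __init__(self, inner: VanillaBackend) -> None:
        self.inner = inner


class Tokenizer:
    def __init__(self) -> None:
        self._tokenizer = VanillaBackend()


def load_adapted() -> Tokenizer:
    tok = Tokenizer()  # vanilla load; no class-level state is touched
    tok._tokenizer = ShimBackend(tok._tokenizer)  # adapt this instance only
    return tok


def build_pool(size: int) -> list[Tokenizer]:
    # Each pool slot loads in its own worker thread.
    with ThreadPoolExecutor(max_workers=size) as executor:
        return list(executor.map(lambda _: load_adapted(), range(size)))
```

Every slot ends up with its own shim, while a `Tokenizer()` created outside `load_adapted` still gets a `VanillaBackend`.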

Behavior

Fastokens encode/decode parity and the existing incompatible-model denylist remain unchanged.
use_fastokens=False, trusted revision pins, tokenizer mirrors, and renderer auto-resolution retain their existing behavior.
Unsupported fastokens backends return the already-loaded vanilla tokenizer instead of loading it a second time.
Offset-aware tokenizers now load directly through vanilla Transformers; the global-patch race workaround is no longer needed.
Fast-path announcement remains once per process under concurrent pool construction.
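The gating and fallback rules above can be sketched as one vanilla load followed by an optional in-place swap that leaves the loaded tokenizer untouched on failure. The denylist entry, the `ShimUnsupported` exception, the `make_shim` helper, and its `kind` check are all hypothetical; only the control flow is meant to match the description.

```python
# Hypothetical names throughout; this is a control-flow sketch only.
class ShimUnsupported(Exception):
    """Raised by the hypothetical make_shim for backends it cannot wrap."""


class VanillaBackend:
    def __init__(self, kind: str) -> None:
        self.kind = kind


class ShimBackend:
    def __init__(self, inner: VanillaBackend) -> None:
        self.inner = inner


class Tokenizer:
    loads = 0  # counts vanilla loads, to show no second load happens

    def __init__(self, kind: str) -> None:
        Tokenizer.loads += 1
        self._tokenizer = VanillaBackend(kind)


INCOMPATIBLE = {"example-org/denylisted-model"}  # hypothetical denylist entry


def make_shim(backend: VanillaBackend) -> ShimBackend:
    if backend.kind != "bpe":  # hypothetical support check
        raise ShimUnsupported(backend.kind)
    return ShimBackend(backend)


def load_tokenizer(name: str, kind: str, use_fastokens: bool = True) -> Tokenizer:
    tok = Tokenizer(kind)  # exactly one vanilla load
    if not use_fastokens or name in INCOMPATIBLE:
        return tok
    try:
        tok._tokenizer = make_shim(tok._tokenizer)
    except ShimUnsupported:
        pass  # keep the already-loaded vanilla tokenizer
    return tok
```

Because the swap happens after the load, an unsupported backend costs nothing extra: the function simply returns the object it already has.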

This narrower tokenizer ownership contract allows async consumers such as Verifiers to initialize renderer pools in worker threads without exposing temporary Transformers mutations to other coroutines.
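On the consumer side, the pattern this enables might look like the sketch below: a blocking pool build handed to a worker thread with `asyncio.to_thread` while the event loop keeps serving other coroutines. `Tokenizer`, `build_renderer_pool`, and `other_coroutine` are hypothetical placeholders, not the Verifiers or renderers API.

```python
import asyncio


class Tokenizer:
    """Hypothetical placeholder for a tokenizer returned by load_tokenizer."""


def build_renderer_pool(size: int) -> list[Tokenizer]:
    # Blocking work: in a real system this is where tokenizers are loaded.
    return [Tokenizer() for _ in range(size)]


async def other_coroutine() -> str:
    await asyncio.sleep(0)  # yields to the loop while the pool builds
    return "still responsive"


async def main(size: int) -> tuple[int, str]:
    pool, status = await asyncio.gather(
        asyncio.to_thread(build_renderer_pool, size),  # runs off the loop thread
        other_coroutine(),
    )
    return len(pool), status
```

Since tokenizer loading no longer patches Transformers globally, a coroutine running alongside the worker thread has no temporarily mutated class to observe.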

Note

Medium Risk
Touches the default tokenizer load path used by renderer pools (performance and thread-safety contract), but behavior is intentionally preserved with added concurrency coverage.

Overview
Replaces process-wide fastokens.patch_transformers() with per-instance backend adaptation in load_tokenizer: tokenizers load through vanilla AutoTokenizer first, then only the returned object's _tokenizer is wrapped with fastokens._compat._TokenizerShim.

This removes the global patch window (and related stdout suppression / patch lock) that could affect concurrent AutoTokenizer.from_pretrained calls during slow loads. Encode/decode parity, FASTOKENS_INCOMPATIBLE, use_fastokens=False, and trusted-revision behavior stay the same; adaptation failures now keep the already-loaded vanilla tokenizer instead of reloading.

Offset-tokenizer loading in _get_offset_tokenizer drops the patch/unpatch race workaround and always uses vanilla Transformers, since fastokens no longer mutates globals.

Tests add a concurrent “slow load” case and refresh wording from “patch” to “adaptation.”

Reviewed by Cursor Bugbot for commit 7a670c5.

Note

Scope fastokens adaptation to individual tokenizers instead of globally patching transformers

Replaces global fastokens.patch_transformers() / unpatch calls with per-tokenizer backend swapping via a new _adapt_tokenizer_with_fastokens function in renderers/base.py.
load_tokenizer now loads via vanilla AutoTokenizer first, then swaps only the returned tokenizer's _tokenizer backend to the fastokens _TokenizerShim; other AutoTokenizer calls elsewhere in the process are unaffected.
_get_offset_tokenizer no longer toggles any fastokens global state; it loads a vanilla fast tokenizer directly.
A one-time INFO log per process (guarded by _FASTOKENS_ANNOUNCE_LOCK) replaces the previous per-load stdout redirect.
Behavioral Change: concurrent AutoTokenizer.from_pretrained calls made while load_tokenizer is in progress now always get a vanilla tokenizer, eliminating the previous race window introduced by global patching.
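The once-per-process announcement can be sketched as a lock-protected flag. Only the `_FASTOKENS_ANNOUNCE_LOCK` name comes from the summary above; the `_announced` flag, the logger name, the message text, and the `announce_concurrently` helper are assumptions for illustration.

```python
import logging
import threading
from concurrent.futures import ThreadPoolExecutor

logger = logging.getLogger("renderers.sketch")  # hypothetical logger name
_FASTOKENS_ANNOUNCE_LOCK = threading.Lock()
_announced = False  # hypothetical module-level flag


def announce_fast_path() -> bool:
    """Log once per process; returns True only for the call that logged."""
    global _announced
    with _FASTOKENS_ANNOUNCE_LOCK:
        if _announced:
            return False
        _announced = True
    logger.info("fastokens fast path enabled")  # hypothetical message
    return True


def announce_concurrently(calls: int) -> int:
    """Calls announce_fast_path from many threads; returns how many logged."""
    with ThreadPoolExecutor(max_workers=8) as executor:
        return sum(executor.map(lambda _: announce_fast_path(), range(calls)))
```

The check and the flag update happen under the same lock, so however many pool slots race to announce, exactly one call logs.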

Macroscope summarized 7a670c5.

macroscopeapp · 2026-06-21T10:29:14Z

Approvability

Verdict: Needs human review

This PR refactors the fastokens integration from process-wide patching to per-tokenizer adaptation, changing how core tokenizer loading infrastructure operates. While the change simplifies code and improves isolation, modifications to how this optimization is applied warrant human review.


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7a670c50a4


chatgpt-codex-connector · 2026-06-21T10:30:55Z

-        # restore the prior patch state. Never cache a non-offset tokenizer.
+        # This path deliberately loads through vanilla Transformers because
+        # fastokens adaptation is scoped to ``load_tokenizer``'s returned object.
        offset_tok = _load_tokenizer_via_auto(load_name_or_path, **kwargs)


Restore unpatched offset reloads under global fastokens

When the host process has already called fastokens.patch_transformers() (for example because another serving stack enables it globally), this reload is not vanilla: _load_tokenizer_via_auto() returns another fastokens shim, _has_offsets() stays false, and hand-coded renderers that call attribute_text_segments() raise instead of using the offset cache. The deleted fallback used to temporarily unpatch in exactly this case, so this path still needs to force the offset tokenizer reload through unpatched Transformers.
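One way to act on this suggestion, sketched under a toy model: detect that the reload still lacks offsets and retry once with the class state temporarily restored. `Loader`, `_ORIGINAL_BACKEND`, `temporarily_unpatched`, and `load_offset_backend` are hypothetical; real code would need fastokens' own unpatch entry point, which is not named here. The retry briefly reintroduces a process-wide mutation, so the sketch confines it to the already-patched case and serializes it with a lock.

```python
import contextlib
import threading


# Hypothetical toy model: a class attribute stands in for the Transformers
# state that a host-level patch_transformers() call would have mutated.
class VanillaBackend:
    offsets = True


class ShimBackend:
    offsets = False  # models a shim without offset mappings


class Loader:
    backend_cls = VanillaBackend

    @classmethod
    def from_pretrained(cls):
        return cls.backend_cls()


_ORIGINAL_BACKEND = VanillaBackend  # hypothetical: captured before any patch
_UNPATCH_LOCK = threading.Lock()  # serializes the temporary restore


@contextlib.contextmanager
def temporarily_unpatched():
    with _UNPATCH_LOCK:
        current = Loader.backend_cls
        Loader.backend_cls = _ORIGINAL_BACKEND
        try:
            yield
        finally:
            Loader.backend_cls = current  # put the host's patch back


def load_offset_backend():
    backend = Loader.from_pretrained()
    if backend.offsets:
        return backend  # process is not globally patched; nothing to undo
    with temporarily_unpatched():
        return Loader.from_pretrained()
```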


Commit 7a670c5: Scope fastokens adaptation to loaded tokenizers

xeophon mentioned this pull request on Jun 21, 2026 in PrimeIntellect-ai/verifiers#1791, "Initialize and reuse V1 renderer pools efficiently" (open).

xeophon wants to merge 1 commit into main from codex/per-tokenizer-fastokens. The PR description (Jun 21, 2026) was edited by macroscopeapp Bot.